Abstract: This paper focuses on the design of medium access
control protocols for cognitive radio networks. The scenario in 
which a single cognitive user wishes to opportunistically exploit 
the availability of empty frequency bands within parts of the 
radio spectrum having multiple bands is first considered. In this 
scenario, the availability probability of each channel is unknown 
a priori to the cognitive user. Hence efficient medium access 
strategies must strike a balance between exploring (learning) 
the availability probability of the channels and exploiting the 
knowledge of the availability probability identified thus far. For 
this scenario, an optimal medium access strategy is derived and 
its underlying recursive structure is illustrated via examples. To 
avoid the prohibitive computational complexity of this optimal 
strategy, a low complexity asymptotically optimal strategy is 
developed. Next, the multi-cognitive user scenario is considered 
and low complexity medium access protocols, which strike an 
optimal balance between exploration and exploitation in such 
competitive environments, are developed. 

I. Introduction 

Recently, the opportunistic spectrum access problem has 
been the focus of significant research activities [1]. The idea 
is to allow unlicensed users (i.e., cognitive users) to access 
the available spectrum when the licensed users (i.e., primary 
users) are not active, thereby increasing the spectral efficiency of
the existing wireless networks. The presence of high priority 
primary users and the requirement that the cognitive users 
should not interfere with them define a new medium access 
paradigm that we refer to as cognitive medium access. The 
goal of the current work is to develop a unified framework for 
the design of efficient, and low complexity, cognitive medium 
access protocols. 

The spectral opportunities available to cognitive users are 
by their nature time-varying on different time-scales. For 
example, on a small scale, multimedia data traffic of the 
primary users will tend to be bursty [2]. On a large scale, one 
would expect the activities of each user to vary throughout the 
day. Therefore, to avoid interfering with the primary network, 
cognitive users must first probe to determine whether there are 
primary activities before transmission. Under the assumption 
that each cognitive user cannot access all of the available 
channels simultaneously [3]-[6], the main task of the medium
access protocol is to distributively choose which channels each 
cognitive user should attempt to use in different time slots, in 
order to fully (or maximally) utilize the spectral opportunities. 
The statistical information about the primary users' traffic 
will be useful for this decision process. For example, with 
a single cognitive user capable of accessing (sensing) only 
one channel at each time slot, the problem becomes trivial if 
the probability that each channel is free is known a priori. In 
this case, the optimal rule is for the cognitive user to access 
the channel with the highest probability of being free in all 
time slots. However, such time-varying traffic information is
typically not available to the cognitive users a priori. The 
need to learn this information on-line creates a fundamental 
tradeoff between exploitation and exploration. Exploitation 
refers to the short-term gain resulting from accessing the 
channel with the estimated highest probability of being free 
(based on previous sensing results) whereas
exploration is the process by which a cognitive user learns the 
statistical behavior of the primary traffic (by choosing possibly 
different channels to probe across time slots). In the presence 
of multiple cognitive users, the medium access algorithm must 
also account for the competition between different users over 
the same channel. 

In this paper, we develop a unified framework for the 
design and analysis of cognitive medium access protocols. 
This framework allows for the construction of strategies that 
strike an optimal balance among exploration, exploitation 
and competition. Tools from reinforcement machine learning 
are exploited to develop optimal cognitive medium access 
protocols for the cognitive radio networks. More specifically, 
we consider the following scenarios in this paper. In the 
first scenario, we assume the existence of a single cognitive 
user capable of accessing only a single channel in each time 
slot. In this setting, we derive an optimal sensing rule that 
maximizes the expected throughput obtained by the cognitive 
user. Compared with a genie-aided scheme, in which the 
cognitive user knows a priori the primary network traffic 
information, there is a throughput loss suffered by any medium 
access strategy. We obtain a lower bound on this loss and 
further construct a linear complexity single index protocol that 
achieves this lower bound asymptotically (when the primary 
traffic behavior changes very slowly). In the second scenario, 
we design distributed sensing rules for the scenario in which 
there are multiple cognitive users. The cognitive users must
also take the competition from other cognitive users into
consideration when making sensing decisions. With different 
assumptions on prior information available at the cognitive 
users, we develop optimal distributed sensing strategies and 
characterize the performance loss of these strategies compared 
with the optimal centralized scheme. 

The rest of this paper is organized as follows. Our network 
model is detailed in Section II. Section III analyzes the 
scenario in which there is only a single cognitive user. The 
extension to the multi-user case is reported in Section IV. 
Finally, Section V summarizes our conclusions and points out 
several possible future directions. Due to space limitations, we
omit the proofs of the results presented in this paper. Interested 
readers can refer to [7] for details. 

II. Network Model 

Figure 1 shows the channel model under study. We consider
a primary network consisting of $N$ non-overlapping channels,
$\mathcal{N} = \{1, \dots, N\}$, each with bandwidth $B_w$. The users in the
primary network operate in a synchronous time-slotted
fashion. We assume that at each time slot, channel $i$ is free
with probability $\theta_i$. Let $Z_i(j)$ be a random variable that equals
1 if channel $i$ is free at time slot $j$ and equals 0 otherwise.
Hence, given $\theta_i$, $Z_i(j)$ is a Bernoulli random variable with
probability density function (pdf)

$$h_{\theta_i}(z) = \theta_i\,\delta(z - 1) + (1 - \theta_i)\,\delta(z),$$

where $\delta(\cdot)$ is a delta function. Furthermore, for a given $\theta =
(\theta_1, \dots, \theta_N)$, the $Z_i(j)$s are independent for each $i$ and $j$. We
consider a block varying model in which the value of $\theta$ is
fixed for a block of $T$ time slots and then randomly changes
at the beginning of the next block according to a joint pdf
$f(\theta)$.

Fig. 1. Channel model (one block of T time slots, t = 1 to t = T).

In our model, the cognitive users attempt to exploit the 
availability of free channels in the primary network by sensing 
the activity at the beginning of each time slot. Our work seeks 
to characterize efficient strategies for choosing which channels 
to sense (access). The challenge here stems from the fact that 
the cognitive users are assumed to be unaware of the exact 
value of $\theta$ a priori. We consider two cases in which a cognitive
user either has or does not have prior information about the pdf
of $\theta$, i.e., $f(\theta)$. To further illustrate the point, let us consider
our first scenario in which a single cognitive user is capable
of sensing only one channel at each time slot. At time slot $j$,
the cognitive user selects one channel $S(j) \in \mathcal{N}$ to sense.
If the sensing result shows that channel $S(j)$ is free, i.e.,
$Z_{S(j)}(j) = 1$, the cognitive user can send $B$ bits over this
channel; otherwise, the cognitive user will wait until the next
time slot and select a possibly different channel to sense. The
number of bits that a cognitive user is able to send over a
block with $T$ slots is

$$W = \sum_{j=1}^{T} B\,Z_{S(j)}(j).$$

$W$ is a random variable that depends on the traffic in the
primary network and, more importantly for us, on the medium
access protocols employed by the cognitive user. Therefore,
the overarching goal of Section III is to construct low
complexity medium access protocols that maximize $E\{W\}$.
Intuitively, the cognitive user would like to select the channel
having the highest probability of being free in order to obtain
more transmission opportunities. If $\theta$ is known then this
problem is trivial: the cognitive user should choose the channel
$i^* = \arg\max_{i \in \mathcal{N}} \theta_i$ to sense. The uncertainty in $\theta$ imposes a
fundamental tradeoff between exploration, in order to learn
$\theta$, and exploitation, by accessing the channel with the highest
estimated availability probability based on current information
gathered through sensing, as detailed in the following sections.
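
The channel model and the block throughput $W$ can be illustrated with a small simulation. The following Python sketch draws the Bernoulli availabilities for one block with a fixed $\theta$ and evaluates the genie-aided baseline that always senses $i^*$. The function name `simulate_block`, the `policy` callback, the 0-based channel indices and the numerical values are assumptions made for illustration only.

```python
import random

def simulate_block(theta, policy, T, B=1.0, rng=None):
    """Return W, the bits sent in one block of T slots with fixed theta."""
    rng = rng or random.Random()
    bits = 0.0
    for j in range(1, T + 1):
        s = policy(j)                    # channel S(j), 0-based (assumption)
        if rng.random() < theta[s]:      # Z_{S(j)}(j) = 1 with probability theta_s
            bits += B
    return bits

# Genie-aided baseline: theta is known, so always sense i* = argmax theta_i.
theta = [0.2, 0.5, 0.8]
best = max(range(len(theta)), key=lambda i: theta[i])
rng = random.Random(0)
runs = [simulate_block(theta, lambda j: best, 1000, rng=rng) for _ in range(200)]
genie_avg = sum(runs) / len(runs)        # should be close to B * T * max(theta)
```

Any strategy that has to learn $\theta$ on-line can be plugged in through `policy` and compared against this baseline.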

III. Single User Scenario 

We start by developing an optimal solution to the single
cognitive user scenario. We can model our single user
cognitive medium access problem as a bandit problem, a class
of problems studied in reinforcement machine learning. In a
typical setting, a decision maker must sequentially choose
one process to observe from $N \ge 2$ stochastic processes,
which have parameters that are unknown to the decision maker.
Associated with each observation is a utility function. The
objective of the decision maker is to maximize the sum or
discounted sum of the utilities via a strategy that specifies
which process to observe for every possible history of selections
and observations. A comprehensive treatment covering
different variants of bandit problems can be found in [8]-[11].

A. Optimal Solution for the General Case 

The cognitive user employs a medium access strategy $\Gamma$,
which will select channel $S(j) \in \mathcal{N}$ to sense at time slot $j$
for any possible causal information pattern obtained through
the previous $j - 1$ observations:

$$\Psi(j) = \{s(1), z_{s(1)}(1), \dots, s(j-1), z_{s(j-1)}(j-1)\}, \quad j \ge 2,$$

i.e., $s(j) = \Gamma(f, \Psi(j))$. Notice that $z_{s(j)}(j)$ is the sensing
outcome of the $j$th time slot, in which $s(j)$ is the channel
being accessed. If $j = 1$, there is no accumulated information,
and thus $\Psi(1) = \phi$ and $s(1) = \Gamma(f)$. We denote the expected
value of the payoff obtained by a cognitive user who uses
strategy $\Gamma$ as $W_\Gamma = E_f\{W\}$, where $W$ is defined in Section II.
We further denote

$$V^*(f, T) = \sup_{\Gamma} W_\Gamma,$$

which is the largest throughput that the cognitive user could
obtain when the spectral opportunities are governed by $f(\theta)$
and the exact value of each realization of $\theta$ is not known
by the user a priori. Each medium access decision made by
the cognitive user has two effects. The first one is the short
term gain, i.e., an immediate transmission opportunity if the
chosen channel is found free. The second one is the long
term gain, i.e., the updated statistical information about $f(\theta)$.
This information will help the cognitive user in making better
decisions in future stages. There is an interesting tradeoff
between the short and long term gains. If we only want to
maximize the short term gain, we can pick the channel with
the highest estimated free probability to sense, based on the
current information. This myopic strategy maximally exploits
the existing information. On the other hand, by picking other
channels to sense, we gain valuable statistical information
about $f(\theta)$ that can effectively guide future decisions. This
process is typically referred to as exploration, as noted previously.

More specifically, let $f^j(\theta)$ be the updated pdf after making
$j - 1$ observations. We begin with $f^1(\theta) = f(\theta)$. After
observing $z_{s(j)}(j)$, we update the pdf using the following
Bayesian formula.

1) If $z_{s(j)}(j) = 1$,

$$f^{j+1}(\theta) = \frac{\theta_{s(j)} f^j(\theta)}{\int \theta_{s(j)} f^j(\theta)\,d\theta}. \qquad (1)$$

2) If $z_{s(j)}(j) = 0$,

$$f^{j+1}(\theta) = \frac{(1 - \theta_{s(j)}) f^j(\theta)}{\int (1 - \theta_{s(j)}) f^j(\theta)\,d\theta}. \qquad (2)$$

The following result characterizes the optimal medium
access control protocols.

Lemma 1: For any prior pdf $f$, the following condition
specifies $V^*$ and the optimal strategy $\Gamma^*$:

$$V^*(f, T) = \max_{s(1) \in \mathcal{N}} E_f\left\{B Z_{s(1)} + V^*(f_{Z_{s(1)}}, T - 1)\right\}, \qquad (3)$$

where $f_{Z_{s(1)}}$ is the conditional pdf updated using the Bayesian
formula, as if the cognitive user chooses $s(1)$ and observes
$Z_{s(1)}$. Also, $V^*(f_{Z_{s(1)}}, T - 1)$ is the value of a bandit
problem with prior information $f_{Z_{s(1)}}$ and $T - 1$ sequential
observations. □

In principle, Lemma 1 provides the solution that maximizes
$W_\Gamma$. Effectively, it decouples the calculation at each stage,
and hence, allows the use of dynamic programming to solve
the problem. The idea is to solve the channel selection
problem with a smaller dimension first and then use backward
induction to obtain the optimal solution for a problem with a
larger dimension. Starting with $T = 1$, the second term inside
the expectation in (3) is 0, since $T - 1 = 0$. Hence, the optimal
solution is to choose the channel $i$ having largest $E_f\{B Z_i\}$,
which can be calculated as

$$E_f\{B Z_i\} = B \int \theta_i f(\theta)\,d\theta,$$

and $V^*(f, 1) = \max_{i \in \mathcal{N}} E_f\{B Z_i\}$.
With the solution for $T = 1$ at hand, we can now solve the
$T = 2$ case using (3). At first, for every possible choice of
$s(1)$ and possible observation $z_{s(1)}$, we calculate the updated
pdf $f_{z_{s(1)}}$ using the Bayesian formula. Next, we calculate
$V^*(f_{z_{s(1)}}, 1)$ (which is a $T = 1$ problem with updated pdf
$f_{z_{s(1)}}$). Finally, applying (3), we have the following equation
for the channel selection problem with $T = 2$:

$$V^*(f, 2) = \max_{i \in \mathcal{N}} \int \left[B\theta_i + \theta_i V^*(f_{z_i = 1}, 1) + (1 - \theta_i) V^*(f_{z_i = 0}, 1)\right] f(\theta)\,d\theta.$$

Correspondingly, the optimal first decision $\Gamma^*(f)$ is the index $i$
that attains the maximum in $V^*(f, 2)$, i.e., in the first step, the
cognitive user should choose $i^*(1) = \Gamma^*(f)$ to sense. After
observing $z_{i^*(1)}$, the cognitive user has $\Psi(2) = \{i^*(1), z_{i^*(1)}\}$,
and it should choose the channel $i^*(2)$ that attains the maximum
in $V^*(f_{z_{i^*(1)}}, 1)$, implying that $\Gamma^*(f, \Psi(2)) = i^*(2)$.

Similarly, after solving the $T = 2$ problem, one can proceed
to solve the $T = 3$ case. Using this procedure recursively, we
can solve the problem with $T - 1$ observations. Finally, our
original problem with $T$ observations is solved as follows.

$$V^*(f, T) = \max_{i \in \mathcal{N}} \int \left[B\theta_i + \theta_i V^*(f_{z_i = 1}, T - 1) + (1 - \theta_i) V^*(f_{z_i = 0}, T - 1)\right] f(\theta)\,d\theta.$$
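
The recursion can be made concrete for one tractable family of priors. The sketch below assumes independent channels with Beta$(a_i, b_i)$ priors, an assumption that is not made in the text: under it, the Bayesian updates (1) and (2) reduce to incrementing $a_i$ or $b_i$, and the integral in the recursion reduces to the posterior mean $a_i/(a_i + b_i)$. The function name `optimal_value` is hypothetical.

```python
from functools import lru_cache

def optimal_value(prior, T, B=1.0):
    """V*(f, T) for independent channels with Beta(a_i, b_i) priors (assumed)."""
    @lru_cache(maxsize=None)
    def V(state, t):
        if t == 0:
            return 0.0
        best = float("-inf")
        for i, (a, b) in enumerate(state):
            mean = a / (a + b)                                   # E_f{theta_i}
            free = state[:i] + ((a + 1, b),) + state[i + 1:]     # update (1)
            busy = state[:i] + ((a, b + 1),) + state[i + 1:]     # update (2)
            # B*theta_i + theta_i*V(free) + (1 - theta_i)*V(busy), averaged over f
            value = mean * (B + V(free, t - 1)) + (1 - mean) * V(busy, t - 1)
            best = max(best, value)
        return best
    return V(tuple((a, b) for a, b in prior), T)

v1 = optimal_value([(1, 1), (1, 1)], T=1)   # two channels, uniform priors
v2 = optimal_value([(1, 1), (1, 1)], T=2)
```

Even with memoization, the number of reachable posterior states grows quickly with $T$ and $N$, which is the complexity issue that motivates the simpler strategies considered later.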

The optimal solution presented above can be simplified 
when $f(\theta)$ has a certain structure, as illustrated by the
following examples. 

Example 1: (One Known Channel) We have $N = 2$ channels
with independent primary traffic distributions. Moreover,
$\theta_2$ is known. The traffic pattern of channel 1 is unknown, and
the probability density function of $\theta_1$ is given by $f_1(\theta_1)$. Since
channel 2 is known and is independent of channel 1, sensing 
channel 2 will not provide the cognitive user with any new 
information. Hence, once the cognitive user starts accessing 
channel 2 (meaning that at a certain stage, sensing channel 2 
is optimal), there would be no reason to return to channel 1 
in the optimal strategy. A generalized version of this assertion 
was first proved in Lemma 4.1 of [12]. Restating the strategy 
in our channel selection setup, we have the following lemma. 

Lemma 2: In the optimal medium access strategy, once the
cognitive user starts accessing channel 2, it should keep picking
the same channel in the remaining time slots, regardless
of the outcome of the sensing process. □

This lemma essentially converts the channel selection problem
to an optimal stopping problem [13], where we only need
to focus on the strategies that decide at which time slot we
should stop sensing channel 1, if it is ever accessed. The
following lemma derives the optimal stopping rule.

Lemma 3: For any $f_1$ and any $T$, if $\theta_2 > \Lambda(f_1, T)$,
then we should sense channel 2. Here

$$\Lambda(f_1, T) = \max_{\Gamma \in \tilde{\Gamma},\ \Gamma(f_1) = 1} \frac{E_{f_1}\left\{\sum_{j=1}^{M} Z_1(j)\right\}}{E_{f_1}\{M\}}, \qquad (4)$$

where $\tilde{\Gamma}$ is the set of strategies that start with channel 1 and
never switch back to channel 1 after selecting channel 2; and
$M$ is a random number that represents the last time slot in
which channel 1 is sensed, when the cognitive user follows a
strategy in $\tilde{\Gamma}$.

One can now combine Lemma 2 and Lemma 3 to obtain the 
following optimal strategy. 

1) At any time slot $j$, if channel 2 was sensed at time slot
$j - 1$, keep sensing channel 2.

2) If channel 1 was sensed at time slot $j - 1$, update the
pdf $f_1^j$ using (1) and (2) and compute $\Lambda(f_1^j, T - j + 1)$
using (4). If $\Lambda(f_1^j, T - j + 1) < \theta_2$, switch to channel
2; otherwise, keep sensing channel 1. □
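
The stopping structure of Example 1 can also be explored numerically without evaluating (4) directly. The sketch below is an equivalent dynamic program rather than the index computation itself: it assumes a Beta$(a, b)$ prior on $\theta_1$ (not specified in the text) and uses Lemma 2, so that switching to channel 2 with $t$ slots left is worth exactly $B\theta_2 t$. The function name `one_known_channel_plan` is hypothetical.

```python
from functools import lru_cache

def one_known_channel_plan(a, b, theta2, T, B=1.0):
    """Return (value, first_channel) for a Beta(a, b) prior on theta_1 (assumed)."""
    @lru_cache(maxsize=None)
    def V(a, b, t):
        if t == 0:
            return 0.0, 2
        stay_on_2 = B * theta2 * t           # Lemma 2: never return to channel 1
        mean = a / (a + b)                   # posterior mean of theta_1
        explore = (mean * (B + V(a + 1, b, t - 1)[0])
                   + (1 - mean) * V(a, b + 1, t - 1)[0])
        # Ties are resolved in favour of channel 1, as in step 2) above.
        return (explore, 1) if explore >= stay_on_2 else (stay_on_2, 2)
    return V(a, b, T)

value_hi, first_hi = one_known_channel_plan(1, 1, theta2=0.9, T=5)
value_lo, first_lo = one_known_channel_plan(1, 1, theta2=0.1, T=5)
```

With a uniform prior on $\theta_1$, a strong known channel ($\theta_2 = 0.9$) is selected immediately, whereas a weak one ($\theta_2 = 0.1$) makes it worthwhile to start on channel 1.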

Example 2: (Independent Channels) We have $N$ independent
channels with $f(\theta) = \prod_{i=1}^{N} f_i(\theta_i)$.
This case has a simple form of solution in the asymptotic
scenario $T \to \infty$ assuming the following discounted form for
the utility function

$$W_\Gamma = E_f\left\{\sum_{j=1}^{\infty} \alpha^j B Z_{S(j)}(j)\right\},$$

where $0 < \alpha < 1$ is a discount factor. This particular scenario
has been considered in [3], and the optimal strategy for this
scenario is the following.

1) If channel $l$ was selected at time slot $j - 1$, then we get
the updated pdf $f_l^j$ using equations (1) and (2), based
on the sensing result $z_l(j - 1)$. For other channels, we
let $f_i^j = f_i^{j-1}$, $\forall i \ne l$, $i \in \mathcal{N}$. That is, we only update
the pdf of the channel which was just accessed (due to
the independence assumption).

2) For each channel, we calculate an index using the
following equation

$$\Lambda_i(f_i^j) = \max_{\Gamma \in \tilde{\Gamma}} \frac{E_{f_i^j}\left\{\sum_{m=1}^{M} \alpha^m Z_i(m)\right\}}{E_{f_i^j}\left\{\sum_{m=1}^{M} \alpha^m\right\}},$$

where $\tilde{\Gamma}$ is the set of strategies for the equivalent
One-Known-Channel selection problem (with channel $i$
having the unknown parameter) and $M$ is a random
number corresponding to the last time slot in which
channel $i$ will be selected in the equivalent One-Known-Channel
case. $\Lambda_i$ is typically referred to as the Gittins
Index [14].

3) Choose the channel with the largest Gittins Index to 
sense at time slot j. 



The optimality of this strategy is a direct application of 
the elegant result of Gittins and Jones [14]. Computational 
methods for evaluating the Gittins Index $\Lambda_i$ can be found
in [15] and references therein. 

B. Non-parametric Asymptotic Analysis and Asymptotically 
Optimal Strategies 

The optimal solution developed in Lemma 1 suffers from a
prohibitive computational complexity. In particular, the dimensionality
of our search space grows exponentially with the
block length $T$. Moreover, one can envision many practical
scenarios in which it would be difficult for the cognitive user
to obtain the prior information $f(\theta)$. In the remainder of
this section, we analyze non-parametric schemes that do not
explicitly use $f(\theta)$, and thus the rules $\Gamma$ considered in the
following depend only on $\Psi(j)$ explicitly. We aim to develop
schemes that have low complexity but still maintain certain
optimality.

For a given strategy $\Gamma$, the expected number of bits the
cognitive user is able to transmit through a block with given
parameters $\theta$ is

$$E\{W \mid \theta\} = \sum_{j=1}^{T} \sum_{i=1}^{N} B\theta_i \Pr\{\Gamma(\Psi(j)) = i\}.$$

Recall that $\Gamma(\Psi(j)) = i$ means that, following strategy $\Gamma$, the
cognitive user should choose channel $i$ in time slot $j$, based
on the available information $\Psi(j)$. Here $\Pr\{\Gamma(\Psi(j)) = i\}$ is
the probability that the cognitive user will choose channel $i$ at
time slot $j$, following the strategy $\Gamma$.

Compared with the idealistic case where the exact value of $\theta$
is known, in which the optimal strategy for the cognitive user
is to always choose the channel with the largest availability
probability, the loss incurred by $\Gamma$ is given by

$$L(\theta; \Gamma) = \sum_{j=1}^{T} B\theta_{i^*} - \sum_{j=1}^{T} \sum_{i=1}^{N} B\theta_i \Pr\{\Gamma(\Psi(j)) = i\},$$

where $\theta_{i^*} = \max\{\theta_1, \dots, \theta_N\}$. We say that a strategy $\Gamma$ is
consistent if, for any $\theta \in [0, 1]^N$, there exists $\beta < 1$ such that
$L(\theta; \Gamma)$ scales as $O(T^{\beta})$ (see footnote 1). The following lemma
characterizes the fundamental limits of any consistent scheme.

Lemma 4: For any $\theta$ and any consistent strategy $\Gamma$, we have

$$\liminf_{T \to \infty} \frac{L(\theta; \Gamma)}{\ln T} \ge B \sum_{i \in \mathcal{N} \setminus \{i^*\}} \frac{\theta_{i^*} - \theta_i}{D(\theta_i \| \theta_{i^*})}, \qquad (5)$$

where $D(\theta_i \| \theta_{i^*})$ is the Kullback-Leibler divergence between
the two Bernoulli random variables with parameters
$\theta_i$ and $\theta_{i^*}$ respectively: $D(\theta_i \| \theta_{i^*}) = \theta_i \ln(\theta_i / \theta_{i^*}) +
(1 - \theta_i) \ln((1 - \theta_i)/(1 - \theta_{i^*}))$. □

Footnote 1: In this paper, we use Knuth's asymptotic notation: 1) $g_1(N) = o(g_2(N))$
means that $\forall c > 0$, $\exists N_0$ such that $\forall N > N_0$, $g_1(N) < c\,g_2(N)$;
2) $g_1(N) = \omega(g_2(N))$ means that $\forall c > 0$, $\exists N_0$ such that $\forall N > N_0$,
$g_2(N) < c\,g_1(N)$; 3) $g_1(N) = O(g_2(N))$ means that $\exists c_2 > c_1 > 0$, $N_0$
such that $\forall N > N_0$, $c_1 g_2(N) < g_1(N) < c_2 g_2(N)$.
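
To get a feel for the constant multiplying $\ln T$ in the bound of Lemma 4, the following sketch evaluates the Bernoulli Kullback-Leibler divergence and the resulting coefficient. It assumes that the sum runs over the channels with $\theta_i$ strictly below the maximum and that the largest availability probability lies strictly between 0 and 1; the function names are hypothetical.

```python
import math

def bernoulli_kl(p, q):
    """D(p || q) between Bernoulli(p) and Bernoulli(q), with 0 * ln 0 = 0."""
    def term(x, y):
        return 0.0 if x == 0 else x * math.log(x / y)
    return term(p, q) + term(1 - p, 1 - q)

def loss_lower_bound_constant(theta, B=1.0):
    """Coefficient of ln T in the lower bound; requires 0 < max(theta) < 1."""
    best = max(theta)
    return B * sum((best - t) / bernoulli_kl(t, best) for t in theta if t < best)

kl = bernoulli_kl(0.5, 0.8)
coeff = loss_lower_bound_constant([0.5, 0.8])
```

Channels whose availability is close to the best one have a small divergence and therefore dominate the bound: they are the hardest to tell apart from the best channel.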



Lemma 4 shows that the loss of any consistent strategy
grows at least in proportion to $\ln T$. An intuitive explanation of this
loss is that we need to spend at least $O(\ln T)$ time slots on
sampling each of the channels with smaller $\theta_i$, in order to
get a reasonably accurate estimate of $\theta$, and hence use it to
determine the channel having the largest $\theta_i$ to sense. We say
that a strategy $\Gamma$ is order optimal if $L(\theta; \Gamma) \sim O(\ln T)$.

Now, the first question that arises is whether there exist
order optimal strategies. As shown later in this section, we can
design suboptimal strategies that have loss of order $O(\ln T)$.
Thus the answer to this question is affirmative. Before proceeding
to the proposed low complexity order-optimal strategy,
we first analyze the loss order of some heuristic strategies that
may appear to be reasonable in certain applications.

The first simple rule is the random strategy $\Gamma_r$ where, at
each time slot, the cognitive user randomly chooses a channel
from the available $N$ channels. The fraction of time slots
the cognitive user spends on each channel is therefore $1/N$,
leading to the loss

$$L(\theta; \Gamma_r) = \frac{B \sum_{i=1}^{N} (\theta_{i^*} - \theta_i)}{N}\, T \sim O(T).$$
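
As a quick numerical check of the linear growth, the loss of the random strategy can be evaluated directly from the closed form above. The function name `random_strategy_loss` and the example values are assumptions for illustration.

```python
def random_strategy_loss(theta, T, B=1.0):
    """Closed-form loss of uniform random channel selection over T slots."""
    best = max(theta)
    return B * T * sum(best - t for t in theta) / len(theta)

loss_100 = random_strategy_loss([0.2, 0.5, 0.8], T=100)
loss_200 = random_strategy_loss([0.2, 0.5, 0.8], T=200)
```

Doubling the block length doubles the loss, in contrast to the logarithmic growth that an order optimal strategy achieves.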

The second one is the myopic rule $\Gamma_g$ in which the cognitive
user keeps updating $f^j(\theta)$, and chooses the channel with the
largest value of

$$\hat{\theta}_i = \int \theta_i f^j(\theta)\,d\theta$$

at each stage. Since there are no convergence guarantees for
the myopic rule, that is, $\hat{\theta}$ may never converge to $\theta$ due to the
lack of sufficiently many samples for each channel [16], the
loss of this myopic strategy is $O(T)$.

The third protocol we consider is the staying with the winner
and switching from the loser rule $\Gamma_{SW}$, where the cognitive
user randomly chooses a channel in the first time slot. In the
succeeding time slots, 1) if the accessed channel was found to
be free, it will choose the same channel to sense; 2) otherwise,
it will choose one of the remaining channels based on a certain
switching rule.

Lemma 5: No matter what the switching rule is,
$L(\theta; \Gamma_{SW}) \sim O(T)$. □

Now, we present a linear complexity order optimal strategy. 

Rule 1: (Order optimal single index strategy) The cognitive
user maintains two vectors $X$ and $Y$, where each $X_i$ records
the number of time slots in which the cognitive user has sensed
channel $i$ to be free, and each $Y_i$ records the number of time
slots in which the cognitive user has chosen channel $i$ to sense.

1) Initialization: at the beginning of each block, each
channel is sensed once.

2) After the initialization period, the cognitive user obtains
an estimate $\hat{\theta}$ at the beginning of time slot $j$, given by $\hat{\theta}_i(j) =
X_i(j)/Y_i(j)$, and assigns an index $\Lambda_i(j)$
to the $i$th channel. The cognitive user chooses the channel
with the largest value of $\Lambda_i(j)$ to sense at time slot $j$. After
each sensing, the cognitive user updates $X$ and $Y$. □

Lemma 6: The strategy specified in Rule 1 is order optimal. □
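
The bookkeeping of Rule 1 can be sketched in a few lines. The index expression itself does not appear in the text above, so the sketch below substitutes a UCB1-style index, $\hat{\theta}_i(j) + \sqrt{2 \ln j / Y_i(j)}$, purely as an assumption; it is one well-known index with logarithmic loss and is not claimed to be the index of Rule 1. The function name and the example parameters are also hypothetical.

```python
import math
import random

def single_index_strategy(theta, T, B=1.0, seed=0):
    """Simulate one block of T >= N slots; returns (bits sent, sensing counts Y)."""
    rng = random.Random(seed)
    n = len(theta)
    X = [0] * n   # X_i: slots in which channel i was sensed free
    Y = [0] * n   # Y_i: slots in which channel i was sensed
    bits = 0.0
    for j in range(1, T + 1):
        if j <= n:
            i = j - 1                        # initialization: sense each channel once
        else:
            # Assumed UCB1-style index (not taken from the text).
            index = [X[c] / Y[c] + math.sqrt(2 * math.log(j) / Y[c]) for c in range(n)]
            i = max(range(n), key=lambda c: index[c])
        free = rng.random() < theta[i]       # sensing outcome Z_i(j)
        X[i] += free
        Y[i] += 1
        bits += B * free
    return bits, Y

bits, Y = single_index_strategy([0.2, 0.5, 0.8], T=5000)
```

Each slot costs one pass over the $N$ channels, and the counts in `Y` show how the sensing effort concentrates on the best channel while the inferior channels are still sampled occasionally.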

The intuition behind this strategy is that as long as $Y_i$ grows
as fast as $O(\ln T)$, $\Lambda_i$ converges to the true value of $\theta_i$ in
probability, and the cognitive user will choose the channel with
the largest $\theta_i$ eventually. The loss of $O(\ln T)$ comes from the
time spent on sampling the inferior channels in order to learn
the value of $\theta$. This price, however, is inevitable as established
in the lower bound of Lemma 4.

IV. Multiple Cognitive Users Scenario 

The presence of multiple cognitive users adds an element
of competition to the problem. In order for a cognitive user to
get hold of a channel now, it must be free from the primary
traffic and other competing cognitive users. More rigorously,
we assume the presence of a set $\mathcal{K} = \{1, \dots, K\}$ of cognitive
users and consider the distributed medium access decision
processes at the multiple users with no coordination. We
denote $\mathcal{K}_i(j) \subseteq \mathcal{K}$ as the random set of users who choose
to sense channel $i$ at time slot $j$. We assume that the users
follow a generalized version of the Carrier Sense Multiple
Access/Collision Avoidance (CSMA-CA) protocol to access
the channel after sensing the main channel to be free, i.e., if
channel $i$ is free, each user $k$ in the set $\mathcal{K}_i(j)$ will generate
a random number $t_k(j)$ according to a certain probability
density function $g$, and wait the time specified by the generated
random number. At the end of the waiting period, user $k$ senses
the channel again, and if it is found free, the packet from user $k$
will be transmitted. The probability that user $k$ in the set $\mathcal{K}_i(j)$
gains access to the channel is the same as the probability that
$t_k(j)$ is the smallest random number generated by the users in
the set $\mathcal{K}_i(j)$. Thus, the throughput user $k$ achieves in a block
is

$$W_k = \sum_{j=1}^{T} B\,Z_{s_k(j)}(j)\, I\left(k = \arg\min_{q \in \mathcal{K}_{s_k(j)}(j)} t_q(j)\right),$$

in which $s_k(j)$ is the channel selected by the $k$th user at time
slot $j$, and $I(\cdot)$ is an indicator function.

Therefore, user $k$ should devise a sensing rule $\Gamma_k$ that maximizes
$E\{W_k\}$. Clearly, even if $\theta$ is known, it is not optimal
anymore for all the users to always choose the channel with
the largest $\theta_i$ to sense. In particular, if all the users choose
the channel with the largest $\theta_i$, the probability that a given
user gains control of the channel decreases, while potential
opportunities in other channels in the primary network are
wasted.
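
The contention step within one time slot can be sketched as follows. Each user that sensed a free channel draws a backoff value, and the smallest draw on each channel wins it. The uniform backoff density, the function name `resolve_slot` and the 0-based indices are assumptions for illustration; the text only requires some density $g$.

```python
import random

def resolve_slot(choices, free, rng, B=1.0):
    """choices[k]: channel sensed by user k; free[i]: True if channel i is free.

    Returns the bits sent by each user in this slot.
    """
    contenders = {}
    for k, i in enumerate(choices):
        if free[i]:
            contenders.setdefault(i, []).append(k)
    bits = [0.0] * len(choices)
    for i, users in contenders.items():
        # Each contender draws t_k(j); g is assumed uniform on [0, 1).
        backoff = {k: rng.random() for k in users}
        winner = min(backoff, key=backoff.get)   # smallest backoff wins channel i
        bits[winner] = B
    return bits

slot_bits = resolve_slot([0, 0, 1, 2], [True, True, False], random.Random(1))
```

Here users 0 and 1 compete for channel 0, user 2 has channel 1 to itself, and user 3 sensed a busy channel, so exactly two transmissions take place.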

A. Known $\theta$ Case

To enable a succinct presentation, we first consider the case
in which the values of $\theta$ are known to all the cognitive users.
The users distributively choose channels to sense and compete
for access if the channels are free.

1) The Optimal Symmetric Strategy: Without loss of generality,
we consider a mixed strategy where user $k$ will choose
channel $i$ with probability $p_{k,i}$. Furthermore, we let $\mathbf{p}_k =
[p_{k,1}, \dots, p_{k,N}]$ and consider the symmetric solution in which
$\mathbf{p} = \mathbf{p}_1 = \dots = \mathbf{p}_K$. The symmetry assumption implies
that all the users in the network distributively follow the same 
rule to access the spectral opportunities present in the primary 
network, in order to maximize the same average throughput 
each user can obtain. The following result derives the optimal 
solution in this situation. 

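Lemma 7: For a cognitive network with $K > 1$ cognitive
users and $N$ channels with probability $\theta$ of being free, the
optimal $\mathbf{p}^*$ is given by

$$p_i^* = \begin{cases} \left[1 - \left(\lambda^* / \theta_i\right)^{1/(K-1)}\right]^+, & \text{for } \theta_i > 0, \\ 0, & \text{for } \theta_i = 0, \end{cases}$$

where $\lambda^*$ is a constant such that $\sum_{i \in \mathcal{N}} p_i^* = 1$. Here $[x]^+ =
\max\{0, x\}$. □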
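
Since $\lambda^*$ is defined only implicitly by the normalization, it has to be found numerically. A minimal sketch, assuming simple bisection (the text does not prescribe a method): the sum of the probabilities decreases from the number of channels with $\theta_i > 0$ to 0 as $\lambda$ goes from 0 to $\max_i \theta_i$, so the root is bracketed. The function name is hypothetical, and at least one channel is assumed to have $\theta_i > 0$.

```python
def optimal_symmetric_probabilities(theta, K, iters=200):
    """Symmetric sensing probabilities p* for K >= 2 users (bisection on lambda)."""
    positive = [t for t in theta if t > 0]
    if len(positive) == 1:                   # only one useful channel: sense it
        return [1.0 if t > 0 else 0.0 for t in theta]

    def probs(lam):
        return [max(0.0, 1.0 - (lam / t) ** (1.0 / (K - 1))) if t > 0 else 0.0
                for t in theta]

    lo, hi = 0.0, max(theta)                 # sum(probs(lo)) >= 1 >= sum(probs(hi))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return probs((lo + hi) / 2)

p_equal = optimal_symmetric_probabilities([0.5, 0.5], K=3)
p_two = optimal_symmetric_probabilities([0.9, 0.1], K=2)
```

For $K = 2$ and two channels the result can be checked by hand: maximizing $\theta_1(1 - (1-p)^2) + \theta_2(1 - p^2)$ gives $p = \theta_1/(\theta_1 + \theta_2)$.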

The total throughput of the $K$ cognitive users can be
represented as

$$KW = BKT \sum_{i \in \mathcal{N}} \frac{\theta_i}{K}\left\{1 - (1 - p_i^*)^K\right\} = BT \sum_{i \in \mathcal{N}} \theta_i \left\{1 - (1 - p_i^*)^K\right\}.$$

On the other hand, the average total spectral opportunities of
the primary network are $BT \sum_{i \in \mathcal{N}} \theta_i$. This upper bound can be
achieved by a centralized channel allocation strategy when
$K \ge N$ (simply by assigning one cognitive user to each
channel). Therefore, the loss of the distributed protocol as
compared with the centralized scheduling is

$$L = BT \sum_{i \in \mathcal{N}} \theta_i (1 - p_i^*)^K.$$

If the number of available channels in the network $N$ is
fixed and the number of cognitive users $K$ in the network
increases, we have the following asymptotic characterization.

Lemma 8: Let $2 \le Q \le N$ be the number of channels for
which $\theta_i > 0$. We have $p_i^* \to 1/Q$ for these channels, and $L \to 0$
exponentially as $K$ increases, i.e., $L \sim O(e^{-c_1 K})$, where

$$c_1 = \ln(Q/(Q - 1)).$$

The reason for the exponential decrease in the loss is that, 
as the number of cognitive users increases, the probability 
that there is no user sensing any particular channel decreases 
exponentially. If Q = 1, there is no loss of performance, 
since all users will always sense the channel with non-zero 
availability probability. 
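
The decay rate can be checked numerically from the loss expression $L = BT \sum_i \theta_i (1 - p_i)^K$. The sketch below plugs in the limiting probabilities $p_i = 1/Q$ instead of the exact $p_i^*$, which is a simplifying assumption; with it, adding one user multiplies the loss by exactly $(Q-1)/Q = e^{-c_1}$. The function name and the example values are hypothetical.

```python
import math

def distributed_loss(theta, p, K, T, B=1.0):
    """L = B * T * sum_i theta_i * (1 - p_i)^K for sensing probabilities p."""
    return B * T * sum(t * (1 - q) ** K for t, q in zip(theta, p))

theta = [0.3, 0.6, 0.9]                  # all channels have theta_i > 0, so Q = N
Q = len(theta)
uniform = [1 / Q] * Q                    # limiting value of p* (assumption)
c1 = math.log(Q / (Q - 1))
ratios = [distributed_loss(theta, uniform, K + 1, 100)
          / distributed_loss(theta, uniform, K, 100) for K in range(2, 10)]
```

Every entry of `ratios` equals $e^{-c_1}$ up to rounding, which is the exponential decay stated in Lemma 8.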

2) The Game Theoretic Model: The optimality of the 
distributed protocol proposed above hinges on the assumption 
that all the users will follow the symmetric rule. However, 
it is straightforward to see that if a single cognitive user 
deviates from the rule specified in Lemma 7, it will be able to 
transmit more bits. If this selfish behavior propagates through 
the network, it may lead to a significant reduction in the overall 
throughput. This observation motivates our next step in which
the channel selection problem is modeled as a non-cooperative
game, where the cognitive users are the players, the $\tau_i$s are
the strategies and the average throughput of each user is the
payoff. The following result derives a sufficient condition for
the Nash equilibrium in the asymptotic scenario $K \to \infty$.

Lemma 9: $(\tau_1, \dots, \tau_N)$ is a Nash equilibrium, if $K$ is
large and at each time slot, there are $\tau_i K$ users sensing channel
$i$, where $\tau_i$ satisfies $\tau_i = \theta_i / \sum_{l \in \mathcal{N}} \theta_l$. At this equilibrium, each
user has probability $\sum_{l \in \mathcal{N}} \theta_l / K$ of transmitting at each time slot. □

With this equilibrium result, the cognitive users can use
the following stochastic sensing strategy to approximately
work on the equilibrium point for a large but finite $K$. Let
$s_k(j)$ be the channel chosen by user $k$ at time slot $j$. At
each time slot, each user independently selects channel $i$ with
probability $\tau_i = \theta_i / \sum_{l \in \mathcal{N}} \theta_l$, i.e., $\Pr\{s_k(j) = i\} = \tau_i$. Then at
each time slot, the number of users sensing channel $i$ will be
$\sum_{k=1}^{K} I\{s_k(j) = i\}$, where the $I\{s_k(j) = i\}$s are i.i.d. Bernoulli
random variables. Hence, the total number of users sensing
channel $i$ is a binomial random number, and the fraction of
users sensing channel $i$ converges to $\tau_i$ in probability as $K$
increases, i.e.,

$$\frac{1}{K} \sum_{k=1}^{K} I\{s_k(j) = i\} \to \tau_i$$

in probability. Hence, as $K$ increases, the operating point will
converge to the Nash equilibrium in probability.
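
The convergence of the sensing fractions can be observed in a short simulation of a single time slot. Every user draws its channel independently with probability $\tau_i$, and the empirical fractions are compared with $\tau$. The function name `sensing_fractions`, the seed and the example values are assumptions for illustration.

```python
import random

def sensing_fractions(theta, K, seed=0):
    """Return (tau, empirical fraction of the K users sensing each channel)."""
    rng = random.Random(seed)
    total = sum(theta)
    tau = [t / total for t in theta]
    # One independent draw s_k(j) per user, with Pr{s_k(j) = i} = tau_i.
    picks = rng.choices(range(len(theta)), weights=tau, k=K)
    counts = [0] * len(theta)
    for i in picks:
        counts[i] += 1
    return tau, [c / K for c in counts]

tau, fractions = sensing_fractions([0.2, 0.3, 0.5], K=100000)
```

For small $K$ the fractions fluctuate noticeably around $\tau$, which is why the equilibrium statement is asymptotic in $K$.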

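For any $K$, the probability that there is no user choosing
channel $i$ to sense is $(1 - \tau_i)^K$. Hence the performance loss
compared with the centralized scheme is

$$L = BT \sum_{i=1}^{N} \theta_i (1 - \tau_i)^K = BT \sum_{i=1}^{N} \theta_i \left(1 - \frac{\theta_i}{\sum_{l=1}^{N} \theta_l}\right)^K.$$

It is easy to check that

$$\lim_{K \to \infty} \frac{L}{\exp\{-c_2 K\}} = BT\theta_{i^*},$$

where $\theta_{i^*} = \min\{\theta_i : \theta_i > 0\}$, and

$$c_2 = \ln \frac{\sum_{l=1}^{N} \theta_l}{\sum_{l=1}^{N} \theta_l - \theta_{i^*}}.$$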



It is now clear that the loss of the game theoretic scheme 
goes to zero exponentially, though the decay rate is smaller 
than that of the scheme specified in Lemma 7. On the other 
hand, compared with the scheme in Lemma 7, the game 
theoretic scheme has the advantage that the cognitive users 
do not need to know the total number of cognitive users K in 
the network and, more importantly, they have no incentive to 
deviate unilaterally. 
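
The loss of the game theoretic scheme and its decay rate can be evaluated directly from $L = BT \sum_i \theta_i (1 - \tau_i)^K$ with $\tau_i = \theta_i / \sum_l \theta_l$. In the sketch below the function names and example values are hypothetical, at least two channels are assumed to have $\theta_i > 0$, and the smallest positive $\theta_i$ is assumed to be unique.

```python
import math

def game_loss(theta, K, T, B=1.0):
    """L = B * T * sum_i theta_i * (1 - tau_i)^K with tau_i = theta_i / sum(theta)."""
    total = sum(theta)
    return B * T * sum(t * (1 - t / total) ** K for t in theta)

def decay_rate_c2(theta):
    """Decay rate set by the smallest positive theta_i (needs two positive entries)."""
    total = sum(theta)
    smallest = min(t for t in theta if t > 0)
    return math.log(total / (total - smallest))

theta = [0.2, 0.3, 0.5]
c2 = decay_rate_c2(theta)
ratio = game_loss(theta, K=200, T=10) / math.exp(-c2 * 200)
```

For these values the ratio is already very close to $BT$ times the smallest positive $\theta_i$, because the channel that is sensed least often dominates the loss for large $K$.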



B. Unknown $\theta$ Case

Now, we consider the more practical scenario in which $\theta$ is
unknown to the cognitive users a priori. Hence, the cognitive
users also need to estimate $\theta$. Combining the results from the
single user case and the multiple user case with known $\theta$, we
design the following low complexity asymptotically optimal
strategy.

Rule 2: 1) Initialization: Each user $k$ maintains the following
two vectors: $X_k$, which records the number of time slots
in which user $k$ has sensed each channel to be free; and $Y_k$,
which records the number of time slots in which user $k$ has
sensed each channel. At the beginning of each block, user $k$
senses each channel once and transmits through this channel
if the channel is free and it wins the competition. Also, set
$X_{k,i} = 1$, regardless of the sensing result of this stage.

2) At the beginning of time slot $j$, user $k$ estimates $\theta_i$ as

$$\hat{\theta}_i(j) = X_{k,i}(j)/Y_{k,i}(j),$$

and chooses each channel $i \in \mathcal{N}$ with probability

$$\hat{\tau}_i(j) = \frac{\hat{\theta}_i(j)}{\sum_{l \in \mathcal{N}} \hat{\theta}_l(j)}.$$

After each sensing, $X_k$ and $Y_k$ are updated. □

Lemma 10: If $K$ is large, the scheme in Rule 2 converges
to the Nash equilibrium specified in Lemma 9 in probability,
as $T$ increases. □

The intuition behind this scheme is that each user will
sample each channel at least $O(T)$ times, and hence as
$T$ increases, the estimate $\hat{\theta}$ converges to $\theta$ in probability,
implying that the unknown $\theta$ case will eventually reduce to
the case in which $\theta$ is known to all the users. Hence, if $K$
is sufficiently large, the operating point converges to the Nash
equilibrium in probability.

If one can assume that the users will follow the pre-specified
rule, then we can design the following strategy that converges
to the optimal operating point in probability for any $K$, as $T$
increases.

Rule 3: 1) Initialization: Same as Rule 2.

2) At the beginning of time slot $j < \ln T$, user $k$ estimates
$\theta_i$ as

$$\hat{\theta}_i(j) = X_{k,i}(j)/Y_{k,i}(j),$$

and chooses each channel $i \in \mathcal{N}$ with probability

For $j > \ln T$, the $i$th channel is sensed with probability

$$p_i = \left[1 - \left(\lambda^* / \hat{\theta}_i(j)\right)^{1/(K-1)}\right]^+. \qquad (6)$$

After each sensing, $X_k$ and $Y_k$ are updated. □

Lemma 11: The proposed scheme converges in probability
to the optimal operating point specified in Lemma 7, as $T$
increases. □

V. Conclusions

This work has developed a unified framework for the design 
and analysis of cognitive medium access protocols. In the 
single user scenario, the optimal sensing strategy that balances 
the tradeoff between exploration and exploitation has been 
developed. A linear complexity cognitive medium access algorithm,
which is asymptotically optimal as the number of time
slots increases, has been proposed. The multi-user setting has
also been formulated as a competitive bandit problem enabling
the design of efficient and game theoretically fair medium
access protocols. Our results motivate several interesting directions
for future research, for example, developing optimal
medium access strategies with consideration of sensing errors 
and other practical issues. Applying other powerful tools from 
sequential analysis to design and analyze wireless networks is 
a promising research direction. 